OregairuChar: A Benchmark Dataset for Character Appearance Frequency Analysis in My Teen Romantic Comedy SNAFU
Sun, Qi, Zhou, Dingju, Zhang, Lina
The analysis of character appearance frequency is essential for understanding narrative structure, character prominence, and story progression in anime. In this work, we introduce OregairuChar, a benchmark dataset designed for appearance frequency analysis in the anime series My Teen Romantic Comedy SNAFU. The dataset comprises 1600 manually selected frames from the third season, annotated with 2860 bounding boxes across 11 main characters. OregairuChar captures diverse visual challenges, including occlusion, pose variation, and inter-character similarity, providing a realistic basis for appearance-based studies. To enable quantitative research, we benchmark several object detection models on the dataset and leverage their predictions for fine-grained, episode-level analysis of character presence over time. This approach reveals patterns of character prominence and their evolution within the narrative. By emphasizing appearance frequency, OregairuChar serves as a valuable resource for exploring computational narrative dynamics and character-centric storytelling in stylized media.
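To make the episode-level analysis concrete, here is a minimal sketch of how per-episode appearance frequency can be computed from detector output; the record layout, character names, and confidence threshold are illustrative assumptions, not the dataset's actual format.

```python
from collections import Counter

# Hypothetical detection records: (episode, frame_index, character, confidence).
# Values are invented for illustration.
detections = [
    (1, 12, "Hachiman", 0.91),
    (1, 12, "Yukino", 0.88),
    (1, 40, "Yui", 0.75),
    (2, 5, "Hachiman", 0.95),
]

def appearance_frequency(dets, conf_thresh=0.5):
    """Count, per episode, the frames in which each character is detected."""
    seen = set()  # (episode, frame, character), so one frame counts once
    for ep, frame, char, conf in dets:
        if conf >= conf_thresh:
            seen.add((ep, frame, char))
    return Counter((ep, char) for ep, _, char in seen)

print(appearance_frequency(detections))
# e.g. Counter({(1, 'Hachiman'): 1, (1, 'Yukino'): 1, (1, 'Yui'): 1, (2, 'Hachiman'): 1})
```

Plotting these counts per episode gives the character-prominence curves the paper describes.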
AudioRole: An Audio Dataset for Character Role-Playing in Large Language Models
Li, Wenyu, Jiao, Xiaoqi, Chang, Yi, Zhang, Guangyan, Guo, Yiwen
The creation of high-quality multimodal datasets remains fundamental for advancing role-playing capabilities in large language models (LLMs). While existing works predominantly focus on text-based persona simulation, Audio Role-Playing (ARP) presents unique challenges due to the need for synchronized alignment of semantic content and vocal characteristics. To address this gap, we propose AudioRole, a meticulously curated dataset from 13 TV series spanning 1K+ hours with 1M+ character-grounded dialogues, providing synchronized audio-text pairs annotated with speaker identities and contextual metadata. In addition, to demonstrate the effectiveness of the dataset, we introduce ARP-Eval, a dual-aspect evaluation framework that assesses both response quality and role fidelity. Empirical validation shows that GLM-4-Voice trained on AudioRole (which we call the ARP-Model) achieves an average Acoustic Personalization score of 0.31, significantly outperforming the original GLM-4-Voice and the more powerful MiniCPM-O-2.6, which specifically supports role-playing in one-shot scenarios. The ARP-Model also achieves a Content Personalization score of 0.36, surpassing the untrained original model by about 38% and matching MiniCPM-O-2.6. AudioRole features dialogues from over 115 main characters, 6 trained ARP-Models that role-play different characters, and evaluation protocols. Together, they provide an essential resource for advancing audio-grounded role-playing research.
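For readers who want a feel for the data layout, below is a hedged sketch of what one AudioRole example might look like; all field names are assumptions, since the abstract does not specify the released schema.

```python
from dataclasses import dataclass, field

# Illustrative schema only; field names are assumptions, not the released format.
@dataclass
class ARPExample:
    audio_path: str          # path to the utterance waveform
    transcript: str          # text synchronized with the audio
    speaker: str             # character identity label
    series: str              # source TV series
    context: list[str] = field(default_factory=list)  # preceding dialogue turns

example = ARPExample(
    audio_path="clips/ep01_0042.wav",
    transcript="You never listen, do you?",
    speaker="CharacterA",
    series="SeriesX",
    context=["Where were you last night?"],
)
print(example.speaker, example.transcript)
```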
20 books by female authors for Women's History Month
These authors made history with their powerful books. March is Women's History Month, a time dedicated to honoring the powerful, inspiring and trailblazing women who have contributed amazing things to our world. What better way to celebrate this month than by diving into books written by women? Female authors have written a diverse range of books, from novels to memoirs, to science fiction and horror. Get your bookmarks ready and prepare to be captivated by these must-read books for Women's History Month. Follow an eccentric artist and her daughter through this short novel.
Universal Narrative Model: an Author-centric Storytelling Framework for Generative AI
In their survey of authoring tools for computational narrative, Kybartas and Bidarra note that "we believe that creating a standard model of computational narrative could allow different systems to interact with the same narrative, without being restricted by incompatible models and definitions. Furthermore, such a model would also facilitate research into the generation of specific story components, e.g., allowing for multiple generators and even authors to collaborate on a given narrative" [Kybartas and Bidarra 2017]. This paper proposes such a standard: the Universal Narrative Model (UNM). We foresee that generative AI will enable a new paradigm of storytelling technologies and processes: from assisting a writer of linear media (novels, film, television, etc.) by allowing them to test out scenes and characters before committing them to a script, all the way through to real-time storytelling systems in videogames which respond to a player's agency, and countless use cases in between [Peng et al. 2024]. The UNM is designed to serve any use case in which coherent narrative structure is a consideration and in which authorial intent and direction are privileged. In the last five years, a robust body of research has demonstrated a wide variety of potential uses for computational narrative systems powered by generative AI, and some limited commercial deployments already exist [Yang et al. 2024; Hu et al. 2024]. With such promise, however, comes a series of challenges: technical, narrative, and ethical. The goal of the Entertainment Technology Center's "Universal Narrative Model" project was to produce the UNM as an open standard. The ultimate directive of the project was to privilege, above all else, author-centric design and functionality, setting the stage for generative workflows which extend an author's narrative intent and creativity rather than eclipse or replace it.
Which books do I like?
Rosenbusch, Hannes, Meral, Erdem Ozan
Finding enjoyable fiction books can be challenging, partly because stories are multi-faceted and one's own literary taste might be difficult to ascertain. Here, we introduce the ISAAC method (Introspection-Support, AI-Annotation, and Curation), a pipeline that supports fiction readers in gaining awareness of their literary preferences and finding enjoyable books. ISAAC consists of four steps: a user supplies book ratings, an AI agent researches and annotates the provided books, patterns in book enjoyment are reviewed by the user, and the AI agent recommends new books. In this proof-of-concept self-study, the authors test whether ISAAC can highlight idiosyncratic patterns in their book enjoyment, spark a deeper reflection about their literary tastes, and make accurate, personalized recommendations of enjoyable books and underexplored literary niches. Results highlight substantial advantages of ISAAC over existing methods, such as its integration of automation and intuition, its accurate and customizable annotations, and its explainable book recommendations. Observed disadvantages are that ISAAC's outputs can elicit false self-narratives (if statistical patterns are taken at face value), that books cannot be annotated if their online documentation is lacking, and that people who are new to reading have to rely on assumed book ratings or movie ratings to power the ISAAC pipeline. We discuss additional opportunities of ISAAC-style book annotations for the study of literary trends and the scientific classification of books and readers.
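The four-step pipeline is easy to picture in code. The following skeleton is a minimal sketch with stubbed-out annotation and recommendation steps; the facets, ratings, and function boundaries are invented for illustration and are not the authors' implementation.

```python
from collections import defaultdict

def collect_ratings():
    # Step 1: the user supplies book ratings (title -> 1-5 stars).
    return {"Book A": 5, "Book B": 2}

def annotate(books):
    # Step 2: an AI agent researches each book and attaches facets
    # (genre, pacing, tone, ...). Hard-coded here as a stand-in.
    facets = {"Book A": {"genre": "sci-fi", "pacing": "fast"},
              "Book B": {"genre": "memoir", "pacing": "slow"}}
    return {b: facets[b] for b in books}

def review_patterns(ratings, annotations):
    # Step 3: surface patterns for the user to introspect on,
    # e.g. mean rating per facet value.
    totals = defaultdict(list)
    for book, rating in ratings.items():
        for facet, value in annotations[book].items():
            totals[(facet, value)].append(rating)
    return {k: sum(v) / len(v) for k, v in totals.items()}

ratings = collect_ratings()
annotations = annotate(ratings)
print(review_patterns(ratings, annotations))
# Step 4 (recommendation) would query the agent for unread books
# matching the highest-rated facet values.
```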
$\infty$-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation
Santos, Saul, Farinhas, António, McNamee, Daniel C., Martins, André F. T.
Current video-language models struggle with long-video understanding due to limited context lengths and reliance on sparse frame subsampling, often leading to information loss. This paper introduces $\infty$-Video, which can process arbitrarily long videos through a continuous-time long-term memory (LTM) consolidation mechanism. Our framework augments video Q-formers by allowing them to process unbounded video contexts efficiently and without requiring additional training. Through continuous attention, our approach dynamically allocates higher granularity to the most relevant video segments, forming "sticky" memories that evolve over time. Experiments with Video-LLaMA and VideoChat2 demonstrate improved performance in video question-answering tasks, showcasing the potential of continuous-time LTM mechanisms to enable scalable and training-free comprehension of long videos.
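As a rough intuition for the consolidation idea, the toy sketch below allocates a fixed memory budget across video segments in proportion to an assumed relevance score, so more relevant segments are kept at finer granularity. This illustrates the principle only; it is not the paper's continuous-attention mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(120, 8))            # 120 frame features, dim 8
segments = np.array_split(np.arange(120), 6)  # 6 equal segments
relevance = np.array([0.05, 0.30, 0.05, 0.40, 0.10, 0.10])  # assumed scores

budget = 24  # total frames the long-term memory may keep
alloc = np.maximum(1, np.round(relevance / relevance.sum() * budget)).astype(int)

memory = []
for seg, k in zip(segments, alloc):
    # Keep k evenly spaced frames from this segment: finer sampling where relevant.
    idx = np.linspace(seg[0], seg[-1], num=min(k, len(seg))).astype(int)
    memory.append(frames[idx])
memory = np.concatenate(memory)
print(memory.shape)  # (23, 8) here: close to the 24-frame budget
```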
Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness
Zhao, Jiaxing, Sun, Boyuan, Chen, Xiang, Wei, Xihan
Facial expression captioning has found widespread application across various domains. Recently, the emergence of video Multimodal Large Language Models (MLLMs) has shown promise in general video understanding tasks. However, describing facial expressions within videos poses two major challenges for these models: (1) the lack of adequate datasets and benchmarks, and (2) the limited visual token capacity of video MLLMs. To address these issues, this paper introduces a new instruction-following dataset tailored for dynamic facial expression captioning. The dataset comprises 5,033 manually annotated high-quality video clips containing over 700,000 tokens. Its purpose is to improve the capability of video MLLMs to discern subtle facial nuances. Furthermore, we propose FaceTrack-MM, which leverages a limited number of tokens to encode the main character's face. This model demonstrates superior performance in tracking faces and focusing on the facial expressions of the main characters, even in intricate multi-person scenarios. Additionally, we introduce a novel evaluation metric combining event extraction, relation classification, and the longest common subsequence (LCS) algorithm to assess the content consistency and temporal sequence consistency of generated text. Moreover, we present FEC-Bench, a benchmark designed to assess the performance of existing video MLLMs in this specific task. All data and source code will be made publicly available.
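Of the metric's three ingredients, the LCS component is a standard algorithm, sketched below over made-up event labels; the event extraction and relation classification steps are omitted, and the normalization choice is an assumption.

```python
def lcs_length(a, b):
    """Classic dynamic-programming longest common subsequence over two event lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

# Invented expression-event sequences extracted from reference and generated captions.
reference = ["neutral", "frown", "smile"]
generated = ["frown", "smile", "laugh"]
score = lcs_length(reference, generated) / max(len(reference), len(generated))
print(score)  # 0.666...: the order of shared events is preserved
```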
M$^3$oralBench: A MultiModal Moral Benchmark for LVLMs
Yan, Bei, Zhang, Jie, Chen, Zhiyuan, Shan, Shiguang, Chen, Xilin
Recently, large foundation models, including large language models (LLMs) and large vision-language models (LVLMs), have become essential tools in critical fields such as law, finance, and healthcare. As these models increasingly integrate into our daily life, it is necessary to conduct moral evaluation to ensure that their outputs align with human values and remain within moral boundaries. Previous works primarily focus on LLMs, proposing moral datasets and benchmarks limited to text modality. However, given the rapid development of LVLMs, there is still a lack of multimodal moral evaluation methods. To bridge this gap, we introduce M$^3$oralBench, the first MultiModal Moral Benchmark for LVLMs. M$^3$oralBench expands the everyday moral scenarios in Moral Foundations Vignettes (MFVs) and employs the text-to-image diffusion model, SD3.0, to create corresponding scenario images. It conducts moral evaluation across six moral foundations of Moral Foundations Theory (MFT) and encompasses tasks in moral judgement, moral classification, and moral response, providing a comprehensive assessment of model performance in multimodal moral understanding and reasoning. Extensive experiments on 10 popular open-source and closed-source LVLMs demonstrate that M$^3$oralBench is a challenging benchmark, exposing notable moral limitations in current models. Our benchmark is publicly available.
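As a sketch of how such a benchmark might be scored, the snippet below aggregates per-foundation, per-task accuracy from invented result records; the record format is an assumption, not the benchmark's actual harness.

```python
from collections import defaultdict

# Invented result records: one entry per (scenario image, task) evaluation.
results = [
    {"foundation": "Care", "task": "judgement", "correct": True},
    {"foundation": "Fairness", "task": "classification", "correct": False},
    {"foundation": "Care", "task": "response", "correct": True},
]

hits, totals = defaultdict(int), defaultdict(int)
for r in results:
    key = (r["foundation"], r["task"])
    totals[key] += 1
    hits[key] += r["correct"]  # bool counts as 0/1
for key in sorted(totals):
    print(key, hits[key] / totals[key])
```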
Memorization Over Reasoning? Exposing and Mitigating Verbatim Memorization in Large Language Models' Character Understanding Evaluation
Jiang, Yuxuan, Ferraro, Francis
Recently, Large Language Models (LLMs) have shown impressive performance in character understanding tasks, such as analyzing the roles, personalities, and relationships of fictional characters. However, the extensive pre-training corpora used by LLMs raise concerns that they may rely on memorizing popular fictional works rather than genuinely understanding and reasoning about them. In this work, we argue that 'gist memory' (capturing essential meaning) should be the primary mechanism for character understanding tasks, as opposed to 'verbatim memory' (exact matching of strings). We introduce a simple yet effective method to mitigate mechanized memorization in character understanding evaluations while preserving the essential implicit cues needed for comprehension and reasoning. Our approach reduces memorization-driven performance on popular fictional works from 96% accuracy to 72% and results in up to an 18% drop in accuracy across various character understanding tasks. These findings underscore the issue of data contamination in existing benchmarks, which often measure memorization rather than true character understanding.
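One plausible perturbation in the spirit of this mitigation (the authors' exact procedure may differ) is to replace verbatim character names with neutral aliases, forcing the model to rely on gist rather than surface strings:

```python
import re

# Hypothetical alias table; names chosen purely for illustration.
ALIASES = {"Elizabeth Bennet": "Character A", "Mr. Darcy": "Character B"}

def mask_names(text, aliases=ALIASES):
    """Replace every verbatim character name with a neutral alias."""
    for name, alias in aliases.items():
        text = re.sub(re.escape(name), alias, text)
    return text

passage = "Elizabeth Bennet slowly revises her opinion of Mr. Darcy."
print(mask_names(passage))
# Character A slowly revises her opinion of Character B.
```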
WHAT-IF: Exploring Branching Narratives by Meta-Prompting Large Language Models
Huang, Runsheng "Anson", Martin, Lara J., Callison-Burch, Chris
WHAT-IF (Writing a Hero's Alternate Timeline through Interactive Fiction) is a system that uses zero-shot meta-prompting to create branching narratives from a prewritten story. Played as an interactive fiction (IF) game, WHAT-IF lets the player choose between decisions that the large language model (LLM) GPT-4 generates as possible branches in the story. Starting from an existing linear plot as input, the system creates a branch at each key decision taken by the main character. By meta-prompting the LLM to consider the major plot points from the story, the system produces coherent and well-structured alternate storylines. WHAT-IF stores the branching plot tree in a graph, which helps it both keep track of the story for prompting and maintain the structure for the final IF system. Figure 1 shows the WHAT-IF user interface, filled with the main character, title, and plot of the TV show WandaVision. A video demo of our system can be found here: https://youtu.be/8vBqjqtupcc.
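A minimal sketch of the branching structure and meta-prompt described above might look like the following; the node fields and prompt wording are assumptions, not WHAT-IF's actual code.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PlotNode:
    summary: str                    # scene summary at this branch point
    decision: Optional[str] = None  # the protagonist's choice that led here
    children: list["PlotNode"] = field(default_factory=list)

def branch_prompt(node, major_plot_points):
    # Meta-prompt skeleton: restate the major plot points so the LLM's
    # generated branches stay coherent with the overall story arc.
    return (
        "Major plot points: " + "; ".join(major_plot_points) + "\n"
        f"Current scene: {node.summary}\n"
        "List two decisions the main character could take next."
    )

root = PlotNode(summary="The hero discovers the hidden letter.")
print(branch_prompt(root, ["hero finds letter", "villain revealed", "final duel"]))
```

Each decision returned by the LLM would become a child `PlotNode`, growing the tree that the IF front end later traverses.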